Motivation, Foundations, Reading Closely and Common Errors
delivered at Azim Premji University, Bhopal
2025-10-22
“The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning” - Nate Silver
“My expression gave them an identity; otherwise, words used to wander about in search of meaning” - Madan Mohan Danish
This is my EVIL Plan:
Here is the number of times I daydreamed about food each day, recorded over two weeks:
[1] 24 25 26 19 24 30 27 29 26 27 28 26 27 20
Here is the number of times I tried to complete this presentation over the same period of time:
[1] 26 24 29 14 26 28 21 28 28 35 31 19 24 19
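As a rough illustration, the usual summary statistics for the two series above could be computed like this. It is a minimal Python sketch using only the standard library; the names `food_daydreams`, `presentation_attempts` and `summarise` are labels chosen here for illustration and do not come from the original slides.

```python
from statistics import mean, median, stdev

# Daily counts copied from the two series above (14 days each).
food_daydreams = [24, 25, 26, 19, 24, 30, 27, 29, 26, 27, 28, 26, 27, 20]
presentation_attempts = [26, 24, 29, 14, 26, 28, 21, 28, 28, 35, 31, 19, 24, 19]


def summarise(values):
    """Return a few common summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "mean": round(mean(values), 2),
        "sd": round(stdev(values), 2),  # sample standard deviation
        "max": max(values),
    }


summaries = {
    "food_daydreams": summarise(food_daydreams),
    "presentation_attempts": summarise(presentation_attempts),
}

for name, stats in summaries.items():
    print(name, stats)
```

The two means come out close to each other, which is a reminder that a single summary number says little about how the days actually differed.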
Looking at raw numbers and saying something meaningful about them is neither easy nor intuitive, even when summary statistics are provided. Let us look at this more closely.
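| x1 | x2 | x3 | x4 | y1 | y2 | y3 | y4 |
|---|---|---|---|---|---|---|---|
| 10 | 10 | 10 | 8 | 8.04 | 9.14 | 7.46 | 6.58 |
| 8 | 8 | 8 | 8 | 6.95 | 8.14 | 6.77 | 5.76 |
| 13 | 13 | 13 | 8 | 7.58 | 8.74 | 12.74 | 7.71 |
| 9 | 9 | 9 | 8 | 8.81 | 8.77 | 7.11 | 8.84 |
| 11 | 11 | 11 | 8 | 8.33 | 9.26 | 7.81 | 8.47 |
| 14 | 14 | 14 | 8 | 9.96 | 8.10 | 8.84 | 7.04 |
| 6 | 6 | 6 | 8 | 7.24 | 6.13 | 6.08 | 5.25 |
| 4 | 4 | 4 | 19 | 4.26 | 3.10 | 5.39 | 12.50 |
| 12 | 12 | 12 | 8 | 10.84 | 9.13 | 8.15 | 5.56 |
| 7 | 7 | 7 | 8 | 4.82 | 7.26 | 6.42 | 7.91 |
| 5 | 5 | 5 | 8 | 5.68 | 4.74 | 5.73 | 6.89 |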
| variable | mean | sd |
|---|---|---|
| y1 | 7.50 | 2.03 |
| y2 | 7.50 | 2.03 |
| y3 | 7.50 | 2.03 |
| y4 | 7.50 | 2.03 |
Corr x1, y1: 0.8164205
Corr x2, y2: 0.8162365
Corr x3, y3: 0.8162867
Corr x4, y4: 0.8165214
Anscombe’s quartet, from Data Visualization by Healy
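The near-identical means, standard deviations and correlations of the quartet can be checked with a short script. This is a minimal Python sketch with the data typed in from the table above; the helper `pearson` and the dictionary names are written here for illustration and are not part of the original slides.

```python
from statistics import mean, stdev

# Anscombe's quartet, copied from the table above.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x1, x2 and x3 are identical
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "x1, y1": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "x2, y2": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "x3, y3": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "x4, y4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


# For each pair: mean of y, sample sd of y, and correlation between x and y.
summary = {
    label: (round(mean(ys), 2), round(stdev(ys), 2), round(pearson(xs, ys), 3))
    for label, (xs, ys) in quartet.items()
}

for label, (y_mean, y_sd, r) in summary.items():
    print(f"{label}: mean(y)={y_mean}, sd(y)={y_sd}, corr={r}")
```

The four sets are numerically almost indistinguishable, yet their scatter plots look nothing alike, which is the reason to plot the data before trusting the summary.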
A record of any measurement of interest.
Can you come up with any fun examples?
| Govt Primary School (Status A(1)/NA(2)) | n | percent | valid_percent |
|---|---|---|---|
| 1 | 17379 | 95.4% | 97.4% |
| 2 | 464 | 2.5% | 2.6% |
| NA | 382 | 2.1% | - |
| Total | 18225 | - | - |
| Govt Primary School (Status A(1)/NA(2)) | a | b | c | NA_ |
|---|---|---|---|---|
| 1 | 0.0% (0) | 0.0% (0) | 0.0% (0) | 100.0% (17,379) |
| 2 | 49.4% (229) | 31.7% (147) | 16.4% (76) | 2.6% (12) |
| NA | 0.0% (0) | 0.0% (0) | 0.0% (0) | 100.0% (382) |
| Total | 1.3% (229) | 0.8% (147) | 0.4% (76) | 97.5% (17,773) |
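Row percentages are only meaningful when each row is divided by its own total count, so a Total row has to be built from summed counts rather than from summed percentages. The sketch below, in plain Python, recomputes the row percentages from the counts in the table above; the variable and function names are chosen here for illustration.

```python
# Counts copied from the cross-tabulation above.
# Rows: school status code; columns: categories a, b, c and missing (NA_).
counts = {
    "1": {"a": 0, "b": 0, "c": 0, "NA_": 17379},
    "2": {"a": 229, "b": 147, "c": 76, "NA_": 12},
    "NA": {"a": 0, "b": 0, "c": 0, "NA_": 382},
}
columns = ["a", "b", "c", "NA_"]

# Build the Total row by adding up counts column by column.
total = {col: sum(row[col] for row in counts.values()) for col in columns}
counts_with_total = {**counts, "Total": total}


def row_percentages(row):
    """Express each cell as a percentage of its own row's total count."""
    row_sum = sum(row.values())
    return {col: round(100 * n / row_sum, 1) for col, n in row.items()}


percentages = {
    label: row_percentages(row) for label, row in counts_with_total.items()
}

for label, pct in percentages.items():
    print(label, pct, "row sum:", round(sum(pct.values()), 1))
```

Every row, including the Total row, should add up to roughly 100% (up to rounding); a row that does not is a sign that percentages were added together instead of counts.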
Figure 2: Access to Primary Health Centre in Gujarat - Stacked
Figure 3: Access to Primary Health Centre in Gujarat - Grouped
Figure 4: Access to Primary Health Centre in Gujarat - Proportion
| Hospital | Operations | Survivors | Deaths | 30-day survival (%) | Percentage dying (%) |
|---|---|---|---|---|---|
| London - Harley Street | 418 | 413 | 5 | 98.8 | 1.2 |
| Leicester | 607 | 593 | 14 | 97.7 | 2.3 |
| Newcastle | 668 | 653 | 15 | 97.8 | 2.2 |
| Glasgow | 760 | 733 | 27 | 96.3 | 3.7 |
| Southampton | 829 | 815 | 14 | 98.3 | 1.7 |
| Bristol | 835 | 821 | 14 | 98.3 | 1.7 |
| Dublin | 983 | 960 | 23 | 97.7 | 2.3 |
| Leeds | 1038 | 1016 | 22 | 97.9 | 2.1 |
| London - Brompton | 1094 | 1075 | 19 | 98.3 | 1.7 |
| Liverpool | 1132 | 1112 | 20 | 98.2 | 1.8 |
| London - Evelina | 1220 | 1185 | 35 | 97.1 | 2.9 |
| Birmingham | 1457 | 1421 | 36 | 97.5 | 2.5 |
| London - Great Ormond Street | 1892 | 1873 | 19 | 99.0 | 1.0 |
from: The Art of Statistics
The example cites an 18% increase, which sounds like a large enough proportion to worry about getting bowel cancer. But this is not the absolute difference in risk. It is the relative increase in risk for people who eat bacon every day, compared with people who do not.
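The gap between relative and absolute risk is easy to see with a little arithmetic. The sketch below is illustrative only: the 18% relative increase comes from the example, while the baseline risk of 6 in 100 is an assumption made here (it is the figure usually quoted alongside this example, but it does not appear in these slides). The function name is also hypothetical.

```python
def absolute_risk_change(baseline_risk, relative_increase):
    """Convert a relative increase in risk into the new risk and the absolute change."""
    new_risk = baseline_risk * (1 + relative_increase)
    return new_risk, new_risk - baseline_risk


baseline_risk = 0.06      # ASSUMPTION: about 6 in 100 people without the exposure
relative_increase = 0.18  # the 18% quoted in the example

new_risk, difference = absolute_risk_change(baseline_risk, relative_increase)

print(f"Baseline risk:       {baseline_risk:.1%}")
print(f"Risk with exposure:  {new_risk:.1%}")
print(f"Absolute difference: {difference * 100:.1f} percentage points")
```

Under that assumed baseline, an 18% relative increase moves the risk from about 6 in 100 to about 7 in 100, a change of roughly one percentage point.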
Village population across districts
Distance to closest town
Comparing population and geographical area of villages
Napoleon’s retreat from Russia by Minard, from Data Visualization by Healy
‘Monstrous Costs’ by Nigel Holmes, from Data Visualization by Healy
Rainfall in Glasgow and Edinburgh, from Cara Thompson’s More than pretty graphs
“Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design … [It] consists of complex ideas communicated with clarity, precision, and efficiency. … [It] is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space … [It] is nearly always multivariate … And graphical excellence requires telling the truth about the data. (Tufte, 1983, p. 51).”
Reference to NYT graph, from Data Visualization by Healy
The following problems are distinct but can appear in various combinations in a given figure.
Voeten’s response to NYT graph, from Data Visualization by Healy
Healy’s examples for possible manipulations from Data Visualization by Healy
Healy’s Alternative to the index vs money base chart from Data Visualization by Healy
Bin width example from Fundamentals of Data Visualization by Wilke
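The effect of bin width can be tried out on a small scale. The data behind Wilke's figure is not included in these slides, so this sketch reuses the food daydream counts from earlier in the talk. The function `histogram_counts` and its `origin` parameter are hypothetical helpers written for this illustration, not part of any plotting library.

```python
from collections import Counter

# Food daydream counts from earlier in the talk (14 days).
values = [24, 25, 26, 19, 24, 30, 27, 29, 26, 27, 28, 26, 27, 20]


def histogram_counts(values, bin_width, origin=0):
    """Count values per bin, where each bin covers [left, left + bin_width).

    Returns a dict mapping each bin's left edge to its count.
    Empty bins are omitted.
    """
    left_edges = (origin + ((v - origin) // bin_width) * bin_width for v in values)
    return dict(sorted(Counter(left_edges).items()))


# The same data binned with three different widths.
for width in (1, 3, 6):
    print(f"bin width = {width}")
    for left, count in histogram_counts(values, width, origin=18).items():
        print(f"  [{left:>2}, {left + width:>2})  {'#' * count}")
```

Narrow bins show every small bump in the data, while wide bins collapse almost everything into one bar, so the shape a histogram suggests depends on a choice the reader rarely sees.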
Incorrect data representation example from Fundamentals of Data Visualization by Wilke
Multiple Distribution common error from Fundamentals of Data Visualization by Wilke
For multi-variable distributions
Alternatives for Multiple Distribution from Fundamentals of Data Visualization by Wilke